Then, we use efficient XNOR and Bit-count operations to replace real-valued operations.
Following [199], the forward process of the BNN is
$$a_i = b^{a_{i-1}} \odot b^{w_i}, \qquad (6.7)$$
where $\odot$ represents efficient XNOR and Bit-count operations. Based on XNOR-Net, we
introduce a learnable channel-wise scale factor to modulate the amplitude of the real-valued
convolution. Combined with the Batch Normalization (BN) and activation layers, the 1-bit
convolution is formulated as
$$b^{a_i} = \mathrm{sign}(\Phi(\alpha_i \circ b^{a_{i-1}} \odot b^{w_i})). \qquad (6.8)$$
In KR-GAL, the original output feature $a_i$ is first scaled by a channel-wise scale factor
(vector) $\alpha_i \in \mathbb{R}^{C_i}$ to modulate the amplitude of its real-valued counterpart. It then enters
$\Phi(\cdot)$, a composite function built by stacking several layers, e.g., a BN layer, a non-linear
activation layer, and a max-pooling layer. The output is then binarized by the sign function to
obtain the binary activations $b^{a_i} \in \mathbb{B}^{n_i}$, where $\mathrm{sign}(\cdot)$ returns $+1$ if its input is greater than
zero and $-1$ otherwise. The 1-bit activation $b^{a_i}$ can then be used for the efficient XNOR and
Bit-count operations of the $(i+1)$-th layer.
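To make the computation in Eq. (6.8) concrete, a minimal PyTorch sketch of such a 1-bit convolution is given below. The straight-through estimator used for the gradient of $\mathrm{sign}(\cdot)$, the Hardtanh activation standing in for the non-linear layer of $\Phi(\cdot)$, and the names BiConv2d and BinarySign are illustrative assumptions rather than the exact BiRe-ID implementation; at deployment time the real-valued convolution on $\pm 1$ tensors would be replaced by XNOR and Bit-count operations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarySign(torch.autograd.Function):
    """sign(.) with a clipped straight-through estimator for its gradient
    (a common BNN choice; the excerpt does not specify the backward rule)."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.where(x > 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * (x.abs() <= 1).float()

class BiConv2d(nn.Module):
    """1-bit convolution of Eq. (6.8): b^{a_i} = sign(Phi(alpha_i o (b^{a_{i-1}} xnor b^{w_i})))."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_ch, in_ch, kernel_size, kernel_size))
        self.alpha = nn.Parameter(torch.ones(out_ch))  # channel-wise scale factor alpha_i
        self.stride, self.padding = stride, padding
        self.bn = nn.BatchNorm2d(out_ch)               # part of Phi(.)
        self.act = nn.Hardtanh()                       # non-linear activation in Phi(.)

    def forward(self, bin_act):
        # bin_act is the binary activation b^{a_{i-1}} of the previous layer.
        bin_w = BinarySign.apply(self.weight)          # b^{w_i} = sign(w_i)
        # Stand-in for XNOR + Bit-count: a real-valued convolution on +-1 tensors.
        out = F.conv2d(bin_act, bin_w, stride=self.stride, padding=self.padding)
        out = out * self.alpha.view(1, -1, 1, 1)       # amplitude modulation by alpha_i
        out = self.act(self.bn(out))                   # Phi(.)
        return BinarySign.apply(out)                   # binary activation b^{a_i}
```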
However, the gap in representational capability between $w_i$ and $b^{w_i}$ could lead to a
large quantization error. We aim to minimize this gap to reduce the quantization error
while increasing the binarized kernels' ability to provide information gains. Therefore,
$\alpha_i$ is also used to reconstruct $w_i$ from $b^{w_i}$. This learnable scale factor leads to a novel
learning process in which the convolutional filters are estimated more precisely by minimizing
an adversarial loss. Discriminators $D(\cdot)$ with weights $W_D$ are introduced to distinguish
the unbinarized kernels $w_i$ from the reconstructed ones $\alpha_i \circ b^{w_i}$. Accordingly, $\alpha_i$ and $W_D$
are learned by solving the following optimization problem:
$$\arg\min_{w_i,\, b^{w_i},\, \alpha_i} \max_{W_D} \; L^{K}_{\mathrm{Adv}}(w_i, b^{w_i}, \alpha_i, W_D) + L^{K}_{\mathrm{MSE}}(w_i, b^{w_i}, \alpha_i), \quad \forall i \in N, \qquad (6.9)$$
where $L^{K}_{\mathrm{Adv}}(w_i, b^{w_i}, \alpha_i, W_D)$ is the adversarial loss, defined as
$$L^{K}_{\mathrm{Adv}}(w_i, b^{w_i}, \alpha_i, W_D) = \log(D(w_i; W_D)) + \log(1 - D(b^{w_i} \circ \alpha_i; W_D)), \qquad (6.10)$$
where $D(\cdot)$ consists of several basic blocks, each with a fully connected layer and a
LeakyReLU layer. In addition, we employ discriminators to refine every binarized convolution
layer during the binarization training process.
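As an illustration, a discriminator built from such blocks and the adversarial loss of Eq. (6.10) might be sketched in PyTorch as follows; the hidden width, the sigmoid output, and the choice of applying $D(\cdot)$ to each flattened output filter are assumptions made for this sketch, not details given in the text.

```python
import torch
import torch.nn as nn

class KernelDiscriminator(nn.Module):
    """D(.): basic blocks of a fully connected layer followed by LeakyReLU,
    ending in a sigmoid so the output can be read as a probability."""
    def __init__(self, num_el, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_el, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, w_flat):
        # w_flat: one flattened kernel per row, shape (num_filters, num_el).
        return self.net(w_flat)

def adversarial_loss(D, w, bin_w, alpha, eps=1e-8):
    """Eq. (6.10): log D(w_i; W_D) + log(1 - D(alpha_i o b^{w_i}; W_D))."""
    recon = bin_w * alpha.view(-1, 1, 1, 1)   # reconstructed kernels alpha_i o b^{w_i}
    real = D(w.flatten(1))                    # unbinarized kernels w_i
    fake = D(recon.flatten(1))
    return torch.log(real + eps).mean() + torch.log(1.0 - fake + eps).mean()
```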
Furthermore, $L^{K}_{\mathrm{MSE}}(w_i, b^{w_i}, \alpha_i)$ is the kernel loss between the learned real-valued filters
$w_i$ and the binarized filters $b^{w_i}$, expressed as a mean squared error (MSE):
$$L^{K}_{\mathrm{MSE}}(w_i, b^{w_i}, \alpha_i) = \frac{\lambda}{2} \| w_i - \alpha_i \circ b^{w_i} \|_2^2, \qquad (6.11)$$
where the MSE term narrows the gap between the real-valued $w_i$ and the binarized $b^{w_i}$, and
$\lambda$ is a balancing hyperparameter.
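Putting the two losses together, one possible training step for the min-max problem in Eq. (6.9) is sketched below, reusing the BinarySign and adversarial_loss sketches above. The alternating discriminator/kernel updates and the value of $\lambda$ are assumptions in the spirit of standard GAN training, not the exact BiRe-ID procedure.

```python
import torch

def kernel_losses(D, w, alpha, lam=1e-4):
    """Compute L^K_Adv (Eq. 6.10) and L^K_MSE (Eq. 6.11) for one layer's kernels."""
    bin_w = BinarySign.apply(w)                                   # b^{w_i} = sign(w_i)
    l_adv = adversarial_loss(D, w, bin_w, alpha)
    l_mse = 0.5 * lam * (w - bin_w * alpha.view(-1, 1, 1, 1)).pow(2).sum()
    return l_adv, l_mse

def kr_gal_step(D, w, alpha, opt_kernels, opt_disc):
    # Discriminator step: maximize L^K_Adv over W_D (i.e., minimize its negative).
    l_adv, _ = kernel_losses(D, w.detach(), alpha.detach())
    opt_disc.zero_grad()
    (-l_adv).backward()
    opt_disc.step()
    # Kernel step: minimize L^K_Adv + L^K_MSE over w_i and alpha_i.
    l_adv, l_mse = kernel_losses(D, w, alpha)
    opt_kernels.zero_grad()
    (l_adv + l_mse).backward()
    opt_kernels.step()
```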
6.2.3 Feature Refining Generative Adversarial Learning (FR-GAL)
We introduce generative adversarial learning (GAL) to refine the low-level features through
self-supervision. We employ the high-level feature with abundant semantic information,
$a_H \in \mathbb{R}^{m_H}$, to supervise the low-level feature $a_L \in \mathbb{R}^{m_L}$, where $m_H = C_H \cdot W_H \cdot H_H$
and $m_L = C_L \cdot W_L \cdot H_L$. To keep the channel dimensions identical, we first employ a $1 \times 1$
convolution to reduce $C_H$ to $C_L$ as
$$a^{*}_{H} = f(W_{1 \times 1} \otimes a_H), \qquad (6.12)$$
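A minimal PyTorch sketch of this channel-matching step in Eq. (6.12) is shown below; the concrete channel and spatial sizes, and the choice of ReLU for $f(\cdot)$, are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Example sizes (assumed): high-level feature with C_H channels, low-level target C_L.
C_H, C_L, H_H, W_H = 512, 256, 8, 16

conv1x1 = nn.Conv2d(C_H, C_L, kernel_size=1, bias=False)  # W_{1x1}
f = nn.ReLU()                                              # f(.): non-linearity (assumed)

a_H = torch.randn(1, C_H, H_H, W_H)                        # high-level feature a_H
a_H_star = f(conv1x1(a_H))                                 # a*_H = f(W_{1x1} (x) a_H)
print(a_H_star.shape)                                      # torch.Size([1, 256, 8, 16])
```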